A Genetic Approach to Tuning Compact Trie Clustering
Abstract
The Compact Trie method for document clustering is sensitive to the kind of text it is applied to, but contains various parameters that may be tuned for adaptation to specific applications. We implement a genetic algorithm for optimizing these parameters and apply it to a corpus of texts to demonstrate the feasibility of using genetic algorithms for tuning.
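As a rough illustration of the tuning idea described in the abstract, the following is a minimal sketch of a genetic algorithm searching over a small set of numeric parameters. It is not the authors' implementation: the parameter names (`min_cluster_size`, `merge_threshold`), their ranges, and the `toy_fitness` function are all hypothetical stand-ins, since the abstract does not name the actual Compact Trie parameters or the quality measure used. In a real setting, the fitness function would run the clustering on a corpus and score the result.

```python
import random

# Hypothetical parameters and ranges; the source does not name the real ones.
PARAM_BOUNDS = {
    "min_cluster_size": (1.0, 10.0),
    "merge_threshold": (0.0, 1.0),
}


def toy_fitness(params):
    """Stand-in for clustering quality: peaks at an arbitrary, made-up optimum."""
    target = {"min_cluster_size": 4.0, "merge_threshold": 0.5}
    return -sum((params[k] - target[k]) ** 2 for k in params)


def random_individual(rng):
    """Draw each parameter uniformly from its allowed range."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in PARAM_BOUNDS.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter is copied from one parent at random."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}


def mutate(ind, rng, rate=0.2, scale=0.1):
    """Add Gaussian noise to some parameters, clamped to the allowed range."""
    child = dict(ind)
    for k, (lo, hi) in PARAM_BOUNDS.items():
        if rng.random() < rate:
            child[k] += rng.gauss(0.0, scale * (hi - lo))
            child[k] = min(hi, max(lo, child[k]))
    return child


def tournament(population, fitness, rng, k=3):
    """Pick the fittest of k randomly sampled individuals."""
    return max(rng.sample(population, k), key=fitness)


def evolve(fitness, generations=40, pop_size=30, elite=2, seed=0):
    """Run the genetic algorithm and return the best parameter set found."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the best individuals over unchanged.
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:elite]
        while len(next_gen) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            next_gen.append(mutate(crossover(p1, p2, rng), rng))
        population = next_gen
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve(toy_fitness)
    print(best, toy_fitness(best))
```

Elitism is used here so that the best parameter set found so far is never lost between generations; whether the original work does the same is not stated in the abstract.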
Similar resources
Compact trie clustering for overlap detection in news
We investigate document clustering through adaptation of Zamir and Etzioni’s approach to online web document clustering. Specifically, we generalize the Suffix Tree Clustering method to allow for a wider range of clustering techniques. We apply the modified technique to a corpus of news articles, improving precision by 29% while running 8% faster than the original algorithm.
Compact Suffix Trees Resemble PATRICIA Tries: Limiting Distribution of the Depth
Suffix trees are the most frequently used data structures in algorithms on words. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. For a biased memoryless source, we prove that the limiting distribution for the depth in a PAT tree is the same as the limiting distribution for the depth in a PATRICIA trie, even...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
Data Clustring Using A New CGA(Chaotic-Generic Algorithm) Approach
Clustering is the process of dividing a set of input data into a number of subgroups. The members of each subgroup are similar to each other but different from members of other subgroups. The genetic algorithm has enjoyed many applications in clustering data. One of these applications is the clustering of images. The problem with the earlier methods used in clustering images was in selecting in...
Manipulation Control of a Flexible Space Free Flying Robot Using Fuzzy Tuning Approach
Cooperative object manipulation control of rigid-flexible multi-body systems in space is studied in this paper. During such tasks, flexible members like solar panels may vibrate, which in turn may exert oscillatory disturbing forces on other subsystems and consequently produce errors in the motion of the end-effectors of the cooperative manipulating arms. Therefore, to design and dev...